Analyzing Neurotransmitter Receptors & Protein Sequences

Ákos Kimpián, Joshua Lembeck, Elvin Kalinowski, Mikel Garcia Amez, Marcel Skumantz

2025-02-12

Introduction

  • @Mikel add information here

Data Set info

Channels

Materials & Methods

  • Dirtying and Cleaning
  • EDA
  • PCA
  • Prediction

Dirtying and Cleaning

More data

EDA

Prediction preprocessing

  • AA composition variables (aa_*) as features
  • Receptor classes using pattern-based annotation of Protein_Name as target
    • Cys-loop receptors
    • Ionotropic glutamate receptors
    • Other ionotropic receptors
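
A minimal sketch of how such pattern-based annotation could look, assuming dplyr is available. The toy protein names and the regular expressions are illustrative assumptions, not the patterns used in the analysis; only `Protein_ID`, `Protein_Name` and `Receptor_class` come from the slides.

```r
library(dplyr)

# Toy stand-in for the real data; the names below are illustrative only.
proteins <- tibble(
  Protein_ID   = c("P1", "P2", "P3", "P4"),
  Protein_Name = c(
    "Neuronal acetylcholine receptor subunit alpha-7",
    "Glutamate receptor ionotropic, NMDA 1",
    "P2X purinoceptor 4",
    "Glycine receptor subunit alpha-1"
  )
)

# Hypothetical patterns (assumption): the real annotation rules may differ.
cys_loop_pat <- paste(
  "acetylcholine receptor",
  "gamma-aminobutyric acid receptor",
  "glycine receptor",
  "5-hydroxytryptamine receptor 3",
  sep = "|"
)
iglur_pat <- "glutamate receptor ionotropic|NMDA|AMPA|kainate"

annotated <- proteins |>
  mutate(
    # case_when() evaluates the conditions in order; the first match wins.
    Receptor_class = case_when(
      grepl(cys_loop_pat, Protein_Name, ignore.case = TRUE) ~ "Cys-loop",
      grepl(iglur_pat, Protein_Name, ignore.case = TRUE) ~ "Ionotropic glutamate",
      TRUE ~ "Other ionotropic"
    ),
    Receptor_class = factor(Receptor_class)
  )
```

Because the fallback branch assigns everything unmatched to "Other ionotropic", a sketch like this only makes sense if the data set contains ionotropic receptors exclusively; otherwise an explicit pattern for the third class would be safer.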

PCA

  • Examine structure before modelling
  • Implemented with tidymodels
  • Scatter plot of PC1 vs. PC2
#...
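# Unsupervised recipe: no outcome on the left-hand side of the formula
pca_rec <- recipe(~ ., data = prediction_df) |>
  # Keep ID and class label in the data, but exclude them from the predictors
  update_role(Protein_ID, Receptor_class,
              new_role = "id") |>
  # Centre and scale the features so no single variable dominates the PCA
  step_normalize(all_predictors()) |>
  step_pca(all_predictors())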
#...
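
The elided steps presumably estimate the recipe and plot the scores. A hedged sketch of that workflow on toy data follows; `toy_df`, its random `aa_*` values and `num_comp = 2` are assumptions made only so the example runs on its own.

```r
library(recipes)
library(ggplot2)

set.seed(1)
# Toy stand-in for prediction_df (assumption): random values, no real signal.
toy_df <- data.frame(
  Protein_ID = paste0("P", 1:30),
  Receptor_class = factor(rep(
    c("Cys-loop", "Ionotropic glutamate", "Other ionotropic"),
    each = 10
  )),
  aa_A = runif(30), aa_C = runif(30), aa_G = runif(30), aa_L = runif(30)
)

pca_rec <- recipe(~ ., data = toy_df) |>
  update_role(Protein_ID, Receptor_class, new_role = "id") |>
  step_normalize(all_predictors()) |>
  step_pca(all_predictors(), num_comp = 2)

# prep() estimates means, SDs and loadings; bake(new_data = NULL) returns
# the processed training data, including the id-role columns.
pca_scores <- pca_rec |>
  prep() |>
  bake(new_data = NULL)

# PC1 vs. PC2 scatter, coloured by receptor class
pca_plot <- ggplot(pca_scores, aes(PC1, PC2, colour = Receptor_class)) +
  geom_point() +
  labs(x = "PC1", y = "PC2", colour = "Receptor class")
```

Giving `Receptor_class` the "id" role keeps it out of the PCA itself while leaving it available for colouring the plot.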

Predictive Modeling

  • Stratified 80/20 train–test split to maintain class balance
  • Random Forest classifier with 1000 trees
  • Basic metrics for evaluation; Mean Decrease Gini (MDG) as feature importance
#...
# Supervised recipe: Receptor_class is the outcome, Protein_ID only an identifier
pca_rec <- recipe(Receptor_class ~ ., data = prediction_df) |>
  update_role(Protein_ID,
              new_role = "id") |>
  step_normalize(all_predictors())

rf_spec <- rand_forest(trees = 1000) |>
  set_engine("randomForest") |>
  set_mode("classification")
#...
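
The split, fit and importance steps listed above could be wired together roughly as follows. This is a sketch on simulated data, not the analysis code: `toy_df`, the seed, the shifted `aa_A` feature and the object names `rf_rec`, `rf_fit`, `basic_metrics` and `mdg` are all assumptions.

```r
library(tidymodels)

set.seed(42)
# Simulated stand-in for prediction_df (assumption)
toy_df <- tibble(
  Protein_ID = paste0("P", 1:150),
  Receptor_class = factor(rep(
    c("Cys-loop", "Ionotropic glutamate", "Other ionotropic"),
    each = 50
  )),
  aa_A = rnorm(150), aa_C = rnorm(150), aa_G = rnorm(150)
)
# Shift one feature by class so the toy problem is learnable
toy_df$aa_A <- toy_df$aa_A + as.integer(toy_df$Receptor_class)

# Stratified 80/20 split keeps class proportions similar in both sets
data_split <- initial_split(toy_df, prop = 0.8, strata = Receptor_class)
train_df <- training(data_split)
test_df  <- testing(data_split)

rf_rec <- recipe(Receptor_class ~ ., data = train_df) |>
  update_role(Protein_ID, new_role = "id") |>
  step_normalize(all_predictors())

rf_spec <- rand_forest(trees = 1000) |>
  set_engine("randomForest") |>
  set_mode("classification")

rf_fit <- workflow() |>
  add_recipe(rf_rec) |>
  add_model(rf_spec) |>
  fit(data = train_df)

# Accuracy and Cohen's kappa on the held-out test set
test_pred <- augment(rf_fit, new_data = test_df)
basic_metrics <- metrics(test_pred, truth = Receptor_class,
                         estimate = .pred_class)

# type = 2 requests the mean decrease in node impurity (Gini)
mdg <- randomForest::importance(extract_fit_engine(rf_fit), type = 2)
```

Fitting the recipe inside a workflow means the normalisation parameters are estimated on the training set only and then reused for the test set, which avoids leaking test information into the model.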

Results

Correlation Matrix

PCA

Results of receptor family classification

Length and weight correlation analysis

Discussion (Joshua)

Biological Interpretation

  • Role of the AAs for specific structural characteristics

Limitations and Future Directions

  • Success of the analysis?
  • What could be explored in more detail?